Abstract: The deletion channel is the simplest point-to-point
communication channel that models lack of synchronization. 
Despite significant effort, little is known about its capacity, and 
even less about optimal coding schemes. In this paper we initiate a 
new systematic approach to this problem, by demonstrating that 
capacity can be computed in a series expansion for small deletion 
probability. We compute the two leading terms of this expansion,
and show that capacity is achieved, up to this order, by an i.i.d.
uniform random distribution of the input.

We think that this strategy can be useful in a number of 
capacity calculations. 

I. Introduction 

The (binary) deletion channel accepts bits as inputs, and 
deletes each transmitted bit independently with probability 
d. Computing or providing systematic approximations to its 
capacity is one of the outstanding problems in information 
theory [1]. An important motivation comes from the need to
understand synchronization errors and optimal ways to cope 
with them. 

In this paper we suggest a new approach. We demonstrate 
that capacity can be computed in a series expansion for small 
deletion probability, by computing the first two orders of such 
an expansion. Our main result is the following. 

Theorem I.1. Let $C(d)$ be the capacity of the deletion channel
with deletion probability $d$. Then, for small $d$ and any $\epsilon > 0$,

$$C(d) = 1 + d \log d - A_1 d + O(d^{3/2-\epsilon}), \tag{1}$$

where $A_1 = \log(2e) - \sum_{l=1}^{\infty} 2^{-l-1}\, l \log l$. Further, the iid
Bernoulli(1/2) process achieves capacity up to corrections of
order $O(d^{3/2-\epsilon})$.

Logarithms here (and in the rest of the paper) are understood
to be in base 2. The constant $A_1$ can be easily evaluated to
yield $A_1 \approx 1.154163765$. While one might be skeptical about
the concrete meaning of asymptotic expansions of the type
(1), they often prove surprisingly accurate. For instance, at 10%
deletion probability, Eq. (1) is off the best lower bound proved
in [5] by about 0.010 bits. More importantly, they provide
useful design insight. For instance, the above result shows that
Bernoulli(1/2) is an excellent starting point for the optimal
input distribution. The next terms in the expansion indicate how to
systematically modify the input distribution for $d > 0$ [2].

We think the strategy adopted here might be useful in other
information theory problems. The underlying philosophy is
that whenever capacity is known for a specific value of the
channel parameter, and the corresponding optimal input
distribution is unique and well characterized, it should be possible
to compute an asymptotic expansion around that value. Here
the special channel is the perfect channel, i.e. the deletion
channel with deletion probability $d = 0$. The corresponding
input distribution is the iid Bernoulli(1/2) process.

Fig. 1. Comparison of the asymptotic formula (1) (continuous line) with
upper bounds from [6] (stars *) and lower bounds from [5] (squares, □). The
$O(d^{3/2-\epsilon})$ term in (1) was simply dropped.

A. Related work 

Dobrushin [3] proved a coding theorem for the deletion
channel, and other channels with synchronization errors. He
showed that the maximum rate of reliable communication is
given by the maximal mutual information per bit, and proved
that this can be achieved through a random coding scheme.
This characterization has so far found limited use in proving
concrete estimates. An important exception is provided by the
work of Kirsch and Drinea [4], who use Dobrushin's coding
theorem to prove lower bounds on the capacity of channels
with deletions and duplications. We will also use Dobrushin's
theorem in a crucial way, although most of our effort will be
devoted to proving upper bounds on the capacity.

Several capacity bounds have been developed over the last
few years, following alternative approaches, and are surveyed
in [1]. In particular, it has been proved that $C(d) = \Theta(1-d)$
as $d \to 1$. However, determining the asymptotic behavior in
this limit (i.e. finding a constant $B_1$ such that
$C(d) = B_1(1-d) + o(1-d)$) is an open problem. When applied to the small
$d$ regime, none of the known upper bounds actually captures
the correct behavior (1). As we show in the present paper, this
behavior can be controlled exactly.

When this paper was nearing submission, a preprint by
Kalai, Mitzenmacher and Sudan [7] was posted online, proving
a statement analogous to Theorem I.1. The result of [7] is
however not the same as in Theorem I.1: only the $d \log d$ term
of the series is proved in [7]. Further, the two proofs are based
on very different approaches.

II. Preliminaries 

For the reader's convenience, we restate here some known
results that we will use extensively, along with some
definitions and auxiliary lemmas.

Consider a sequence of channels $\{W_n\}_{n \ge 1}$, where $W_n$
allows exactly $n$ input bits, and deletes each bit independently
with probability $d$. The output of $W_n$ for input $X^n$ is a binary
vector denoted by $Y(X^n)$. The length of $Y(X^n)$ is a binomial
random variable. We want to find the maximum rate at which
we can send information over this sequence of channels with
vanishingly small error probability.
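
The channel $W_n$ is easy to simulate. The following is a minimal sketch, assuming the input is given as a Python list of bits; the function name `deletion_channel` and the optional `rng` argument are illustrative choices, not notation from the paper.

```python
import random

def deletion_channel(bits, d, rng=None):
    """Pass `bits` through a deletion channel with deletion probability d.

    Each bit is dropped independently with probability d; the surviving
    bits keep their order. No marker is left where a bit was deleted,
    which is what distinguishes this from an erasure channel.
    """
    if rng is None:
        rng = random.Random()
    return [b for b in bits if rng.random() >= d]

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(20)]
    y = deletion_channel(x, 0.1, rng)
    print("input :", x)
    print("output:", y, "(length", len(y), ")")
```

The output length is Binomial(n, 1 - d), as stated above, and the receiver does not learn which positions were deleted.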

The following characterization follows from [3]. 

Theorem II.1. Let

$$C_n = \frac{1}{n} \max_{p_{X^n}} I(X^n; Y(X^n)). \tag{2}$$

Then, the following limit exists

$$C = \lim_{n \to \infty} C_n = \inf_{n \ge 1} C_n, \tag{3}$$

and is equal to the capacity of the deletion channel.

Proof: This is just a reformulation of Theorem 1 in [3],
to which we add the remark $C = \inf_{n \ge 1} C_n$, which is of
independent interest. In order to prove this fact, consider the
channel $W_{m+n}$, and let $X^{m+n} = (X_1^m, X_{m+1}^{m+n})$ be its input.
The channel $W_{m+n}$ can be realized as follows. First the input
is passed through a channel $\tilde{W}_{m+n}$ that introduces deletions
independently in the two strings $X_1^m$ and $X_{m+1}^{m+n}$ and outputs
$\tilde{Y}(X_1^{m+n}) = (Y(X_1^m), |, Y(X_{m+1}^{m+n}))$, where $|$ is a marker.
Then the marker is removed.

This construction proves that $W_{m+n}$ is physically degraded
with respect to $\tilde{W}_{m+n}$, whence

$$(m+n)\, C_{m+n} \le \max_{p_{X^{m+n}}} I(X^{m+n}; \tilde{Y}(X_1^{m+n})) \le m C_m + n C_n.$$

Here the last inequality follows from the fact that $\tilde{W}_{m+n}$ is
the product of two independent channels, and hence the mutual
information is maximized by a product input distribution.

Therefore the sequence $\{n C_n\}_{n \ge 1}$ is sub-additive, and the
claim follows from Fekete's lemma. ■

A last useful remark is that, in computing capacity, we can
assume $(X_1, \dots, X_n)$ to be $n$ consecutive coordinates of a
stationary ergodic process.

Lemma II.2. Let $X = \{X_i\}_{i \in \mathbb{Z}}$ be a stationary and ergodic
process, with $X_i$ taking values in $\{0, 1\}$. Then the limit
$I(X) = \lim_{n \to \infty} \frac{1}{n} I(X^n; Y(X^n))$ exists and

$$C = \max_{X \text{ stat. erg.}} I(X). \tag{4}$$



Proof: Take any stationary $X$, and let $I_n = I(X^n; Y(X^n))$.
Notice that $Y(X_1^n) - X_1^n - X_{n+1}^{n+m} - Y(X_{n+1}^{n+m})$
form a Markov chain. Define $\tilde{Y}(X^{n+m})$ as
in the proof of Theorem II.1. As before we have
$I_{n+m} \le I(X^{n+m}; \tilde{Y}(X^{n+m})) \le I(X_1^n; Y(X_1^n)) +
I(X_{n+1}^{n+m}; Y(X_{n+1}^{n+m})) = I_m + I_n$ (the last identity follows
by stationarity of $X$). Thus $I_{m+n} \le I_n + I_m$ and the limit
$\lim_{n \to \infty} I_n/n$ exists by Fekete's lemma, and is equal to
$\inf_{n \ge 1} I_n/n$.

Clearly, $I_n/n \le C_n$ for all $n$. Fix any $\epsilon > 0$. We will construct
a process $X$ such that

$$I_N/N \ge C - \epsilon \quad \forall N > N_0(\epsilon), \tag{5}$$

thus proving our claim.

Fix $n$ such that $C_n > C - \epsilon/2$. Construct $X$ with iid blocks
of length $n$ with common distribution $p^*(n)$ that achieves
the supremum in the definition of $C_n$. In order to make this
process stationary, we make the first complete block to the
right of position 0 start at position $s$ uniformly random in
$\{1, 2, \dots, n\}$. We call the position $s$ the offset. The resulting
process is clearly stationary and ergodic.

Now consider $N = kn + r$ for some $k \in \mathbb{N}$ and
$r \in \{0, 1, \dots, n-1\}$. The vector $X_1^N$ contains at least $k-1$
complete blocks of size $n$, call them $X(1), X(2), \dots, X(k-1)$
with $X(i) \sim p^*(n)$. The block $X(1)$ starts at position $s$. There
will be further $r + n - s + 1$ bits at the end, so that
$X_1^N = (X_1^{s-1}, X(1), X(2), \dots, X(k-1), X_{s+(k-1)n}^N)$. Abusing notation,
we write $Y(i)$ for $Y(X(i))$. Given the output $Y$, we define
$\tilde{Y} = (Y(X_1^{s-1}) \,|\, Y(1) \,|\, Y(2) \,|\, \dots \,|\, Y(k-1) \,|\, Y(X_{s+(k-1)n}^N))$, by
introducing $k$ synchronization symbols $|$. There are at most
$(n+1)^k$ possibilities for $\tilde{Y}$ given $Y$ (corresponding to potential
placements of synchronization symbols). Therefore we have

$$H(Y) = H(\tilde{Y}) - H(\tilde{Y} \mid Y) \ge H(\tilde{Y}) - \log((n+1)^k) \ge (k-1)\, H(Y(1)) - k \log(n+1),$$

where we used the fact that the $(X(i), Y(i))$'s are iid. Further

$$H(Y \mid X^N) \le H(\tilde{Y} \mid X^N) \le (k-1)\, H(Y(1) \mid X(1)) + 2n,$$

where the last term accounts for bits outside the blocks. We
conclude that

$$I(X^N; Y(X^N)) = H(Y) - H(Y \mid X^N) \ge (k-1)\, n C_n - k \log(n+1) - 2n \ge N (C_n - \epsilon/2),$$

provided $\log(n+1)/n < \epsilon/10$ and $N > N_0 = 10n/\epsilon$. Since
$C_n > C - \epsilon/2$, this in turn implies Eq. (5). ■

III. Proof of the main theorem: Outline 

In this section we provide the proof of Theorem I.1. We
defer the proof of several technical lemmas to the next section.

The first step consists in proving achievability by estimating
$I(X)$ for the iid Bernoulli(1/2) process.



Lemma III.1. Let $X^*$ be the iid Bernoulli(1/2) process. For
any $\epsilon > 0$, we have

$$I(X^*) = 1 + d \log d - A_1 d + O(d^{2-\epsilon}). \tag{6}$$
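
For very short blocks, the mutual information under the iid Bernoulli(1/2) input can be computed exactly by enumeration, which gives a finite-$n$ counterpart to Lemma III.1. The sketch below is an illustration under stated assumptions: it uses the fact that the probability of output $y$ given input $x$ equals the number of deletion patterns mapping $x$ to $y$ times $d^{n-|y|}(1-d)^{|y|}$, and all function names are hypothetical. The quantity $I_n/n$ is not the limit $I(X^*)$ itself, so it should only be compared loosely with the expansion.

```python
import itertools
import math

A1 = 1.154163765  # value of A_1 quoted in the text

def count_embeddings(x, y):
    """Number of deletion patterns that turn the tuple x into the tuple y."""
    ways = [1] + [0] * len(y)  # ways[j]: embeddings of y[:j] in the prefix of x seen so far
    for xb in x:
        for j in range(len(y), 0, -1):
            if y[j - 1] == xb:
                ways[j] += ways[j - 1]
    return ways[len(y)]

def mutual_information_uniform(n, d):
    """I(X^n; Y(X^n)) in bits for X^n uniform on {0,1}^n (iid Bernoulli(1/2))."""
    inputs = list(itertools.product((0, 1), repeat=n))
    px = 1.0 / len(inputs)
    info = 0.0
    for m in range(n + 1):
        weight = d ** (n - m) * (1 - d) ** m  # probability of one fixed deletion pattern
        for y in itertools.product((0, 1), repeat=m):
            cond = [count_embeddings(x, y) * weight for x in inputs]  # P(y | x)
            py = px * sum(cond)
            for c in cond:
                if c > 0:
                    info += px * c * math.log2(c / py)
    return info

if __name__ == "__main__":
    d = 0.05
    print("two-term expansion:", 1 + d * math.log2(d) - A1 * d)
    for n in range(1, 6):
        print(n, mutual_information_uniform(n, d) / n)
```

The enumeration cost grows like $4^n$, so this is only practical for small $n$.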

Lemma II.2 allows us to restrict our attention to stationary
ergodic processes in proving the converse. In light of Lemma
III.1, we can further restrict consideration to processes $X$
satisfying $I(X) > 1 + 2d \log d$ and hence $H(X) > 1 + 2d \log d$
(here and below, for a process $X$, we denote by $H(X)$ its
entropy rate).

Given a (possibly infinite) binary sequence, a run of 0's (of
1's) is a maximal subsequence of consecutive 0's (1's), i.e.
a subsequence of 0's bordered by 1's (respectively, of 1's
bordered by 0's). Denote by $\mathcal{S}$ the set of all stationary ergodic
processes and by $\mathcal{S}_L$ the set of stationary ergodic processes
such that, with probability one, no run has length larger than $L$.
The next lemma shows that we don't lose much by restricting
ourselves to $\mathcal{S}_{L^*}$ for large enough $L^*$.

Lemma III.2. For any $\epsilon > 0$ there exists $d_0 = d_0(\epsilon) > 0$ such
that the following happens for all $d < d_0$. For any $X \in \mathcal{S}$ such
that $H(X) > 1 + 2d \log d$ and for any $L^* > \log(1/d)$, there
exists $X_{L^*} \in \mathcal{S}_{L^*}$ such that

$$I(X) \le I(X_{L^*}) + d^{1/2-\epsilon} (L^*)^{-1} \log L^*. \tag{7}$$
We are left with the problem of bounding $I(X)$ from above
for all $X \in \mathcal{S}_{L^*}$. The next lemma establishes such a bound.

Lemma III.3. For any $\epsilon > 0$ there exists $d_0 = d_0(\epsilon) > 0$ such
that the following happens. For any $L^* \in \mathbb{N}$ and any $X \in \mathcal{S}_{L^*}$,
if $d < d_0(\epsilon)$, then

$$I(X) \le 1 + d \log d - A_1 d + d^{2-\epsilon} (1 + d^{1/2} L^*). \tag{8}$$

Proof of Theorem I.1: Lemma III.1 shows achievability.
The converse follows from Lemmas III.2 and III.3 with
$L^* = \lfloor 1/d \rfloor$. ■
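
To see why the choice $L^* = \lfloor 1/d \rfloor$ yields the remainder in Theorem I.1, one can bound the two error terms of Eqs. (7) and (8) directly. The following is a short worked sketch, assuming $d \le 1/2$ so that $1/(2d) \le L^* \le 1/d$:

```latex
% Assumption for this sketch: d <= 1/2, hence 1/(2d) <= L^* = \lfloor 1/d \rfloor <= 1/d.
\begin{align*}
d^{1/2-\epsilon}\,\frac{\log L^*}{L^*}
  &\le d^{1/2-\epsilon}\cdot 2d\cdot\log(1/d)
   = 2\,d^{3/2-\epsilon}\log(1/d), \\
d^{2-\epsilon}\bigl(1 + d^{1/2}L^*\bigr)
  &\le d^{2-\epsilon} + d^{3/2-\epsilon}.
\end{align*}
```

Since $\log(1/d) \le d^{-\epsilon}$ for small enough $d$ and $\epsilon > 0$ is arbitrary, both terms are $O(d^{3/2-\epsilon})$ after renaming $\epsilon$.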

IV. Proofs of the Lemmas 

In Section IV-A we characterize any stationary ergodic $X$ in
terms of its 'bit perspective' and 'block perspective' run-length
distributions, and show that these distributions must be close
to the distributions obtained for the iid Bernoulli(1/2) process.
In Section IV-B we construct a modified deletion process that
allows accurate estimation of $H(Y \mid X^n)$ in the small $d$ limit.
Finally, in Section IV-C we present proofs of the Lemmas
quoted in Section III using the tools developed.

We will often write $X_a^b$ for the random vector
$(X_a, X_{a+1}, \dots, X_b)$ where the $X_i$'s are distributed according
to the process $X$.

A. Characterization in terms of runs 

Consider a stationary ergodic process $X$. Without loss of
generality we can assume that almost surely all runs have finite
length (by ergodicity and stationarity this only excludes the
constant 0 and constant 1 processes). Let $L_0$ be the length of
the run containing position 0 in $X$. Let $L_1$ be the length of the first
run to occur to the right of position 0 in $X$ and, in general,
let $L_i$ be the length of the $i$-th run to the right of position
0. Let $p_{L,X}$ denote the limit of the empirical distribution of
$L_1, L_2, \dots, L_K$, as $K \to \infty$. By ergodicity $p_{L,X}$ is a well
defined probability distribution on $\mathbb{N}$. We call $p_{L,X}$ the
block-perspective run length distribution for obvious reasons, and
use $L$ to denote a random variable drawn according to $p_{L,X}$.
It is not hard to see that, for any $l \ge 1$,

$$P(L_0 = l) = \frac{l\, p_{L,X}(l)}{E[L]}. \tag{9}$$



In other words, $L_0$ is distributed according to the size biased
version of $p_{L,X}$. We call this the bit perspective run length
distribution, and shall often drop the subscript $X$ when clear
from the context. Notice that since $L_0$ is well defined and
almost surely finite, we have $E[L] < \infty$. It follows that the
empirical distribution of run lengths in $X_1^n$ also converges to
$p_{L,X}$ almost surely, since the first and last run do not matter
in the limit.
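
The two run-length distributions can be illustrated on a finite sample. The sketch below is an assumption-laden illustration: it treats the first and last runs of the sample as complete (the text notes they do not matter in the limit), and the helper names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def run_lengths(bits):
    """Lengths of the maximal runs of equal symbols, left to right."""
    return [len(list(group)) for _, group in groupby(bits)]

def block_perspective(bits):
    """Empirical block-perspective distribution: fraction of runs of each length."""
    lengths = run_lengths(bits)
    counts = Counter(lengths)
    return {l: c / len(lengths) for l, c in sorted(counts.items())}

def bit_perspective(block_dist):
    """Size-biased version of a run-length distribution, as in Eq. (9)."""
    mean = sum(l * p for l, p in block_dist.items())
    return {l: l * p / mean for l, p in block_dist.items()}

if __name__ == "__main__":
    sample = [0, 0, 1, 1, 1, 0]
    p_block = block_perspective(sample)
    print("runs            :", run_lengths(sample))
    print("block perspective:", p_block)
    print("bit perspective  :", bit_perspective(p_block))
```

For the iid Bernoulli(1/2) process, where the block-perspective distribution is $2^{-l}$, the same size-biasing gives $P(L_0 = l) = l\,2^{-l-1}$, which is the weight appearing in the constant $A_1$.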

If $L_0, L_1, \dots, L_{K_n}$ are the run lengths in the block $X_0^n$, it
is clear that $H(X_0^n) \le 1 + H(L_1, \dots, L_{K_n}, K_n)$ (where one
bit is needed to remove the 0, 1 ambiguity). By ergodicity
$K_n/n \to 1/E[L]$ almost surely as $n \to \infty$. This also implies
$H(K_n)/n \to 0$. Further, $\limsup_{n \to \infty} H(L_1, \dots, L_{K_n})/n \le
\lim_{n \to \infty} H(L)\, K_n/n = H(L)/E[L]$. If $H(X)$ is the entropy
rate of the process $X$, by taking the $n \to \infty$ limit, it is easy
to deduce that

$$H(X) \le \frac{H(L)}{E[L]}, \tag{10}$$

with equality if and only if $X$ consists of iid runs with common
distribution $p_L$.

For convenience of notation, define $\mu(X) = E[L]$. We know
that given $E[L] = \mu$, the probability distribution with largest
possible entropy $H(L)$ is geometric with mean $\mu$, i.e.
$p_L(l) = (1 - 1/\mu)^{l-1} (1/\mu)$ for all $l \ge 1$, leading to

$$\frac{H(L)}{E[L]} \le -\left(1 - \frac{1}{\mu}\right) \log\left(1 - \frac{1}{\mu}\right) - \frac{1}{\mu} \log \frac{1}{\mu} \equiv h\!\left(\frac{1}{\mu}\right). \tag{11}$$

Here we introduced the notation $h(p) = -p \log p - (1-p) \log(1-p)$
for the binary entropy function.
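
The equality case of Eq. (11) can be checked numerically: for a geometric run-length distribution with mean $\mu$, the ratio $H(L)/E[L]$ should coincide with $h(1/\mu)$. The sketch below truncates the geometric series; the function names and the truncation length are assumptions of this example.

```python
import math

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def geometric_entropy_rate(mu, terms=2000):
    """H(L) / E[L] for p_L(l) = (1 - 1/mu)^(l-1) / mu, series truncated at `terms`."""
    q = 1.0 / mu
    entropy = 0.0
    for l in range(1, terms + 1):
        p = (1 - q) ** (l - 1) * q
        if p > 0:  # skip terms that underflow to zero
            entropy -= p * math.log2(p)
    return entropy / mu

if __name__ == "__main__":
    for mu in (1.5, 2.0, 3.0):
        print(mu, geometric_entropy_rate(mu), binary_entropy(1.0 / mu))
```

At $\mu = 2$ the bound equals 1 bit per symbol, which is the iid Bernoulli(1/2) case; any other mean run length strictly lowers the achievable entropy rate.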

In light of Lemma III.1 we can restrict ourselves to $H(X) > 1 + 2d \log d$.
Using this, we are able to obtain sharp bounds
on $p_L$ and $\mu(X)$.



Lemma IV.1. There exists $d_0 > 0$ such that, for any $X \in \mathcal{S}$
with $H(X) > 1 + 2d \log d$,

$$|\mu(X) - 2| \le \sqrt{100\, d \log(1/d)}, \tag{12}$$

for all $d < d_0$.

Proof: By Eqs. (10) and (11), we have $h(1/\mu) \ge 1 + 2d \log d$.
By Pinsker's inequality $h(p) \le 1 - (1-2p)^2/(2 \ln 2)$,
and therefore $|1 - (2/\mu)|^2 \le (4 \ln 2)\, d \log(1/d)$. The claim
follows from simple calculus. ■



Lemma IV.2. There exists $K' < \infty$ and $d_0 > 0$ such that, for
any $X \in \mathcal{S}$ with $H(X) > 1 + 2d \log d$, and any $d < d_0$,

$$\sum_{l=1}^{\infty} \left| p_L(l) - \frac{1}{2^l} \right| \le K' \sqrt{d \log(1/d)}. \tag{13}$$

Proof: Let $p_L^*(l) = 1/2^l$, $l \ge 1$, and recall that
$\mu(X) = E[L] = \sum_{l \ge 1} p_L(l)\, l$. An explicit calculation yields

$$H(p_L) = \mu(X) - D(p_L \| p_L^*). \tag{14}$$

Now, by Pinsker's inequality,

$$D(p_L \| p_L^*) \ge \frac{2}{\ln 2} \| p_L - p_L^* \|_{TV}^2. \tag{15}$$

Combining Lemma IV.1 and Eqs. (10), (14) and (15), we get
the desired result. ■

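Lemma IV.3. There exists $K'' < \infty$ and $d_0 > 0$ such that,
for any $X \in \mathcal{S}$ with $H(X) > 1 + 2d \log d$, and any $d < d_0$,

$$\sum_{l=1}^{\infty} \left| P(L_0 = l) - \frac{l}{2^{l+1}} \right| \le K'' \sqrt{d (\log(1/d))^3}. \tag{16}$$

Proof: Let $l_0 = \lfloor -\log(K' \sqrt{d \log(1/d)}) \rfloor$. It follows
from Lemma IV.2 that

$$\sum_{l=1}^{l_0} \left| p_L(l) - \frac{1}{2^l} \right| \le K' \sqrt{d \log(1/d)}, \tag{17}$$

which in turn implies

$$\sum_{l=0}^{l_0} l\, p_L(l) \ge \sum_{l=0}^{l_0} \frac{l}{2^l} - K' l_0 \sqrt{d \log(1/d)}. \tag{18}$$

Summing the geometric series, we find that there exists a
constant $K_1 < \infty$ such that

$$\sum_{l=l_0}^{\infty} \frac{l}{2^l} = (l_0 + 1)\, 2^{1-l_0} \le K_1 \sqrt{d (\log(1/d))^3}. \tag{19}$$

Using the identity $\sum_{l=0}^{\infty} l/2^l = 2$, together with Eqs. (18)
and (19), we get

$$\sum_{l=0}^{l_0} l\, p_L(l) \ge 2 - K_1 \sqrt{d (\log(1/d))^3}. \tag{20}$$

Combining this result with Lemma IV.1 we conclude (possibly
enlarging the constant $K_1$)

$$\sum_{l=l_0+1}^{\infty} l\, p_L(l) \le 2 K_1 \sqrt{d (\log(1/d))^3}. \tag{21}$$

Using this result together with Eq. (19), we get

$$\sum_{l=l_0+1}^{\infty} \left| l\, p_L(l) - \frac{l}{2^l} \right| \le 4 K_1 \sqrt{d (\log(1/d))^3}. \tag{22}$$

From a direct application of Lemma IV.2 it follows that
there exists a constant $K_2 < \infty$ such that

$$\sum_{l=1}^{l_0} \left| l\, p_L(l) - \frac{l}{2^l} \right| \le K_2 \sqrt{d (\log(1/d))^3}, \tag{23}$$

and therefore, summing Eqs. (23) and (22),

$$\sum_{l=1}^{\infty} \left| \frac{l\, p_L(l)}{2} - \frac{l}{2^{l+1}} \right| \le 2 (K_1 + K_2) \sqrt{d (\log(1/d))^3}. \tag{24}$$

We know that $P(L_0 = l) = l\, p_L(l)/\mu(X)$. The proof is
completed by using Eq. (24) and bounding $\mu(X)$ with
Lemma IV.1. ■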

B. A modified deletion process 

We define an auxiliary sequence of channels $\hat{W}_n$ whose
output, denoted by $\hat{Y}(X^n)$, is obtained by modifying the
deletion channel output in the following way. If an 'extended
run' (i.e. a run $\mathcal{R}$ along with one additional bit at each end
of $\mathcal{R}$) undergoes more than one deletion under the deletion
channel, then $\mathcal{R}$ will experience no deletion in channel $\hat{W}_n$,
i.e. the corresponding bits are present in $\hat{Y}(X^n)$. Note that
(deletions in) the additional bits at the ends are not affected.

Formally, we construct this sequence of channels as follows
when the input is a stationary process $X$. Let $D$ be an iid
Bernoulli($d$) process, independent of $X$, with $D^n$ being the
$n$-bit vector that contains a 1 if and only if the corresponding
bit in $X^n$ is deleted by the channel $W_n$. We define $\hat{D}(D, X)$ to
be the process containing a subset of the 1s in $D$. The process
$\hat{D}$ is obtained by deterministically flipping some of the 1s in $D$
as described above, simultaneously for all runs. The output of
the channel $\hat{W}_n$ is simply defined by deleting from $X^n$ those
bits whose positions correspond to 1s in $\hat{D}$.

Notice that $(X, D, \hat{D})$ are jointly stationary. The sequence
of channels $W_n$ is defined by $D$, and the coupled sequence
of channels $\hat{W}_n$ is defined by $\hat{D}$. We emphasize that $\hat{D}$ is a
function of $(X, D)$. Let $Z = D \oplus \hat{D}$ (where $\oplus$ is componentwise
sum modulo 2). The process $Z$ is stationary with
$P(Z_0 = 1) \equiv z = E[d - d(1-d)^{L_0+1}] \le 2 d^2 E[L_0]$. Note that $z = O(d^2)$
for $E[L_0] = O(1)$.
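
The construction of the modified deletion pattern can be sketched on a finite block. This is a minimal illustration, assuming the input and the deletion pattern are given as equal-length lists and that extended runs are simply truncated at the block boundaries (a finite-block convention adopted here, not something the text specifies); the function names are hypothetical.

```python
from itertools import groupby

def modified_deletions(x, deletions):
    """Return the modified deletion pattern for input bits x.

    For every run of x, count the deletions falling in its extended run
    (the run plus one neighbouring bit on each side). If that count
    exceeds one, all deletions inside the run itself are cancelled.
    Counts always use the original pattern, so the rule is applied
    simultaneously to all runs.
    """
    n = len(x)
    modified = list(deletions)
    start = 0
    for _, group in groupby(x):
        end = start + len(list(group))          # run occupies positions start .. end-1
        lo, hi = max(start - 1, 0), min(end + 1, n)  # extended run, truncated at the block ends
        if sum(deletions[lo:hi]) > 1:
            for i in range(start, end):
                modified[i] = 0
        start = end
    return modified

def apply_deletions(x, deletions):
    """Delete from x the bits whose positions carry a 1 in `deletions`."""
    return [b for b, deleted in zip(x, deletions) if not deleted]

if __name__ == "__main__":
    x = [0, 0, 0, 1, 1, 0]
    d_pattern = [0, 1, 0, 0, 0, 1]
    d_hat = modified_deletions(x, d_pattern)
    print("modified pattern:", d_hat)
    print("modified output :", apply_deletions(x, d_hat))
```

The returned pattern only ever removes 1s from the original one, matching the statement that the modified process contains a subset of the original deletions.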

The following lemma shows the utility of the modified 
deletion process. 

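Lemma IV.4. Consider any $X \in \mathcal{S}$ such that $E[L_0 \log L_0] < \infty$.
Then

$$\lim_{n \to \infty} \frac{1}{n} H(\hat{D}^n \mid X^n, \hat{Y}^n) = d\, E[\log L_0] - \delta, \tag{25}$$

where $0 \le \delta = \delta(d, X) \le 2 d^2 E[L_0 \log L_0]$.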

Proof: Fix a channel input $x^n$ and any possible output
$\hat{y} = \hat{y}(x^n)$ (i.e. an output that occurs with positive probability
under $\hat{W}_n$). The proof consists in estimating (the logarithm
of) the number of realizations of $\hat{D}^n$ that might lead to the
input/output pair $(x^n, \hat{y})$, and then taking the expectation over
$(x^n, \hat{y})$.

Proceeding from left to right, and using the constraint on
$\hat{D}$, we can map unambiguously each run in $\hat{y}$ to one or more
runs in $x^n$ that gave rise to it through the deletion process.
Consider a run of length $\ell$ in $\hat{y}$. If there is a unique 'parent'
run, it must have length $\ell$ or $\ell + 1$. If the length of the parent
run is $\ell$, then no deletion occurred in this run, and hence
the contribution to $H(\hat{D}^n \mid x^n, \hat{y})$ of such runs vanishes. If the
length of the parent run is $\ell + 1$, one bit was deleted by $\hat{W}_n$
and each of the $\ell + 1$ possibilities is equally likely, leading to
a contribution $\log(\ell + 1)$ to $H(\hat{D}^n \mid x^n, \hat{y})$.

Finally, if there are multiple parent runs of lengths
$l_1, l_2, \dots, l_k$, they must be separated by single bits taking
the opposite value in $x^n$, all of which were deleted. It also
must be the case that $\sum_{i=1}^{k} l_i = \ell$, i.e. there is no ambiguity
in $\hat{D}^n$. This also implies $l_1 < \ell$.

Notice that the three cases described correspond to three
different lengths for the run in $\hat{y}$. This allows us to sequentially
associate runs in $\hat{y}$ with runs in $x^n$, as claimed.

By the above argument, $H(\hat{D}^n \mid x^n, \hat{y}) = \sum_{r \in \mathcal{D}} \log(\ell_r)$,
where $\mathcal{D}$ is the set of runs on which deletions did occur, and
$\ell_r$ are their lengths. Using the definition of $\hat{D}$, the sum can
be expressed as $\sum_{i=1}^{n} \hat{D}_i \log(\ell_{(i)})$, with $\ell_{(i)}$ the length of the
run containing the $i$-th bit. Using the definition of $\hat{D}$, we get
$P(\hat{D}_i = 1) = d (1-d)^{\ell_{(i)}+1} \in (d - (\ell_{(i)} + 1) d^2, d)$ (except for
the last and first block in $x^n$, which can be disregarded). Taking
expectation and letting $n \to \infty$ we get the claim. ■

Corollary IV.5. Under the assumptions of the last Lemma,
and denoting by $h(p)$ the binary entropy function, we have

$$\lim_{n \to \infty} \frac{1}{n} H(Y(X^n) \mid X^n) = h(d) - d\, E[\log L_0] + \delta,$$

where $-2h(z) \le \delta = \delta(d, X) \le 2 d^2 E[L_0 \log L_0] + 2h(z)$ and
$z = d - E[d (1-d)^{L_0+1}]$.

Proof: By definition, $D^n$ is independent of $X^n$. We have,
for $Y = Y(X^n)$,

$$H(Y \mid X^n) = H(D^n \mid X^n) - H(D^n \mid X^n, Y) = n h(d) - H(\hat{D}^n \mid X^n, \hat{Y}) + n \delta_1,$$

with $|\delta_1(d, X)| \le 2 H(Z^n)/n \le 2h(z)$. In the second equality
we used the fact that the pairs $((X^n, Y, D^n), (X^n, \hat{Y}, \hat{D}^n))$
and $((X^n, Y), (X^n, \hat{Y}))$ are both of the form $(A, B)$ such that
$A$ is a function of $(B, Z^n)$ and $B$ is a function of $(A, Z^n)$,
which implies $|H(A) - H(B)| \le H(Z^n)$. ■

C. Proofs of Lemmas III.1, III.2 and III.3

Proof of Lemma III.1: Clearly, $X^*$ has run length
distribution $p_L(l) = 2^{-l}$, $l \ge 1$. Moreover, $Y(X^{*,n})$ is also
an iid Bernoulli(1/2) string of length $\sim$ Binomial($n$, $1-d$).
Hence, $H(Y) = n(1-d) + O(\log n)$. We now use the estimate
of $H(Y \mid X^{*,n})$ from Corollary IV.5. We have $z = O(d^2)$ and
$E[L_0 \log L_0] < \infty$, leading to

$$H(Y \mid X^{*,n}) = n \left( h(d) - d\, E[\log L_0] + O(d^{2-\epsilon}) \right) + o(n).$$

Computing $H(Y) - H(Y \mid X^{*,n})$, we get the claim. ■
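
The final subtraction in the proof of Lemma III.1 can be spelled out. The following is a worked sketch, using only the expansion of the binary entropy for small $d$ and the size-biased run-length law $P(L_0 = l) = l\, 2^{-l-1}$ of the iid Bernoulli(1/2) process:

```latex
\begin{align*}
h(d) &= -d\log d - (1-d)\log(1-d) = -d\log d + d\log e + O(d^2), \\
E[\log L_0] &= \sum_{l=1}^{\infty} 2^{-l-1}\, l \log l, \\
I(X^*) &= (1-d) - h(d) + d\,E[\log L_0] + O(d^{2-\epsilon}) \\
       &= 1 + d\log d - \Bigl(1 + \log e - \sum_{l=1}^{\infty} 2^{-l-1}\, l\log l\Bigr) d + O(d^{2-\epsilon}) \\
       &= 1 + d\log d - A_1 d + O(d^{2-\epsilon}),
\end{align*}
```

where the last step uses $1 + \log e = \log(2e)$, so the bracket is exactly the constant $A_1$ of Theorem I.1.
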
Proof of Lemma III.2: We construct $X_{L^*}$ by flipping a
bit each time it is the $(L^* + 1)$-th consecutive bit with the
same value (either 0 or 1). The density of such bits in $X$
is upper bounded by $\alpha = P(L_0 > L^*)/L^*$. The expected
fraction of bits in the channel output $Y_{L^*} = Y(X_{L^*}^n)$ that
have been flipped relative to $Y = Y(X^n)$ (output of the same
channel realization with different input) is also at most $\alpha$. Let
$F = F(X, D)$ be the binary vector having the same length as
$Y$, with a 1 wherever the corresponding bit in $Y_{L^*}$ is flipped
relative to $Y$, and 0s elsewhere. The expected fraction of 1's
in $F$ is at most $\alpha$. Therefore

$$H(F) \le n (1-d)\, h(\alpha) + \log(n+1). \tag{26}$$

Notice that $Y$ is a deterministic function of $(Y_{L^*}, F)$ and $Y_{L^*}$
is a deterministic function of $(Y, F)$, whence

$$|H(Y) - H(Y_{L^*})| \le H(F). \tag{27}$$

Further, $X - X_{L^*} - X_{L^*}^n - Y_{L^*}$ form a Markov chain, and $X_{L^*}$,
$X_{L^*}^n$ are deterministic functions of $X$. Hence,
$H(Y_{L^*} \mid X_{L^*}^n) = H(Y_{L^*} \mid X)$. Similarly, $H(Y \mid X^n) = H(Y \mid X)$.
Therefore (the second step is analogous to Eq. (27))

$$|H(Y_{L^*} \mid X_{L^*}^n) - H(Y \mid X^n)| = |H(Y_{L^*} \mid X) - H(Y \mid X)| \le H(F). \tag{28}$$

It follows from Lemma IV.3 and $L^* > \log(1/d)$ that
$\alpha \le 2 K'' \sqrt{d (\log(1/d))^3}/L^*$ for sufficiently small $d$. Hence,
$h(\alpha) \le d^{1/2-\epsilon} \log L^*/(2 L^*)$ for $d \le d_0(\epsilon)$, for some
$d_0(\epsilon) > 0$. The result follows by combining Eqs. (26), (27)
and (28) to bound $|I(X) - I(X_{L^*})|$. ■
Proof of Lemma III.3: If $H(X) \le 1 + 2d \log d$, we
are done. Else we proceed as follows. We know that $Y(X^n)$
contains Binomial($n$, $1-d$) bits, leading immediately to

$$H(Y) \le n(1-d) + \log(n+1). \tag{29}$$

We use the lower bound on $H(Y \mid X^n)$ from Corollary IV.5. We
have $z \le 2 d^2 E[L_0]$. It follows from Lemma IV.3 that
$E[L_0] \le K_1 (1 + \sqrt{d (\log(1/d))^3}\, L^*)$, leading to
$h(z) \le 0.5\, d^{2-\epsilon} (1 + (1/2) d^{1/2} L^*)$ for all $d < d_0$, where
$d_0 = d_0(\epsilon) > 0$. Thus, we have the bound

$$\lim_{n \to \infty} \frac{1}{n} H(Y \mid X^n) \ge h(d) - d\, E[\log L_0] - d^{2-\epsilon} (1 + 0.5\, d^{1/2} L^*).$$

Using Lemma IV.3 we have
$\left| E[\log L_0] - \sum_{l=1}^{\infty} 2^{-l-1}\, l \log l \right| = O(d^{(1/2)-\epsilon} \log L^*)$.
The result follows. ■